Abstract

We  are  interested  in  capturing  time  series  generated  by 
small  wireless  electronic  sensors.  Battery-operated  sensors 
must  avoid  heavy  use  of  their  wireless  radio  which  is  a  key 
cause  of  energy  dissipation.  When  many  sensors  transmit, 
the  resources  of  the  recipient  of  the  data  are  taxed;  hence, 
limiting  communication  will  benefit  the  recipient  as  well.  In 
our  paper  we  show  how  time  series  generated  by  sensors 
can  be  captured  and  stored  in  a  database  system  (archive). 
Sensors  compress  time  series  instead  of  sending  them  in  raw 
form. We propose an optimal on-line algorithm for constructing
a piecewise constant approximation (PCA) of a time series
which guarantees that the compressed representation
satisfies an error bound on the $L_\infty$ distance. In addition
to the capture task, we often want to estimate the values
of a time series ahead of time, e.g., to answer real-time
queries.  To  achieve  this,  sensors  may  fit  predictive  models 
on  observed  data,  sending  parameters  of  these  models  to 
the  archive.  We  exploit  the  interplay  between  prediction  and 
compression  in  a  unified  framework  that  avoids  duplicating 
effort  and  leads  to  reduced  communication. 


1.  Introduction 

Data generated by small wireless electronic sensors are
increasingly significant for emerging applications [14, 9,
24]. Sensors are becoming smaller, cheaper and more configurable
[24]. Current and future sensor designs routinely
include a fully programmable CPU, a local memory buffer
and a wireless radio for communication [24, 16]. Sensors
must be treated as equal partners in future distributed
database systems as they can store, manipulate and communicate
information.

1.1.  The  Time  Series  Capture  Task 

In our paper we are interested in capturing sensor-generated
time series. Each sensor, or data producer, generates
a series of values of some measured attribute, e.g., temperature.
Sending these raw values to the data archiver (a
database system) uses up the limited communication bandwidth
[23, 16] and causes energy drain [24, 20]. If multiple
sources of information are involved, bandwidth at the
archiver end may be limited as well [23]. Even if all information
can be received, it may be too difficult to process
if the rate of data generation is high [2, 30]. Obviously,
limiting communication in a system involving sensors will
benefit all involved parties.

We assume that some loss of precision in the archived
version of the time series can be tolerated if this helps reduce
communication. We do not want, however, unbounded
inaccuracy in the stored imprecise series. Besides the capture
task, time series values may be needed ahead of time
by real-time applications, e.g., queries. Such applications
and the capture task must gracefully co-exist.

We observe that time series values are not entirely random
and can thus be compressed. This implies that some
number of samples must be accumulated, since compression
exploits the redundancy of information across many
samples; the sensor must see some samples, compress them
and forward the compressed representation to the archiver.

Propagating messages from the sensor to the archiver
takes time. Hence, any application that requires knowledge
of recent, present or future time series values must wait for
these to arrive. This time will be longer if samples are not
forwarded immediately but are rather compressed. To address
this issue, sensors are tasked with fitting parametric
predictive models of the time series, sending parameters of
these models to the archive. Using these, values of the time
series can be estimated ahead of time, reducing the latency
seen by applications.

1.2.  Why  is  Capturing  Time  Series  Important? 

Many applications over sensors are primarily motivated
by their ability to monitor the physical world in real-time. In
many situations sensor data is useful not only for its present
utility for some application, but for its potential future utility
as well. Therefore, capturing the complete history of a
time series is essential for systems incorporating sensors.
This contrasts somewhat with the emerging paradigm of
rapidly produced data streams whose focus is not primarily
on storage [20, 2].

For example, sensors will often be used in large-scale
scientific experiments. Such experiments, often involving
changing behavior (e.g., the diffusion of pollutants in a
water stream) over long periods of time, cannot be accurately
studied unless one stores the entire history of the phenomenon.
In some cases, e.g., major earthquakes, environmental
disasters, volcano eruptions, the studied process is
rare and hence the value of data collected about it is significant.

In a second example, consider an intrusion detection system
relying on sound and light intensity measuring devices.
A torch-carrying intruder may set off an alarm, but it is conceivable
that the real-time application may be misguided
into not raising an alarm. The next day, when the intrusion
is detected, e.g., a precious item is missing, it would be
useful to “rewind” the time series produced by the system’s
sensors, and try to identify traces of the intrusion.

We view time series generated by sensors as a commodity,
beyond their real-time usefulness. Our work is part of the
Quality-Aware Sensor Architecture (QUASAR) project at
UC Irvine, which aims to create a general architecture for
different sensor-based applications, both on-line and off-line,
both current and future. This differs from a commonplace
view in which sensor-based applications are built
from scratch with a single objective (e.g., real-time monitoring)
without accounting for unforeseen uses of sensor-generated
data.

1.3.  Paper  Organization 

In  Section  2  we  formulate  our  problem  and  sketch  the 
proposed  solution.  In  Section  3  we  consider  compression 
with  quality  guarantees.  In  Section  4  we  motivate  the  need 
for  prediction,  show  how  it  can  be  performed,  and  how  it 
can  co-exist  with  compression.  In  Section  5  we  evaluate 
our  techniques  experimentally.  In  Section  6  we  cover  some 
related  work,  and  in  Section  7  we  summarize  our  work  and 
present  future  research  directions. 

2.  Problem  Formulation 

We will speak of a single data producer (sensor) and data
archiver (database). Keep in mind that in a real system, the
archiver will interact with many producers. Each producer-archiver
interaction will use the algorithms presented in our
paper. The archiver may assign different importance to different
sensors. The problem of gathering data from multiple
sources to achieve an overall level of quality of the archive
is itself very interesting [8, 23]. Rather than capturing the
time series by probing the sensor, we will do so by receiving
messages from it, because wireless devices pay a heavy price
(energy-wise) for listening on the radio channel even if no
data is being transmitted [29].

2.1.  Definitions  and  Assumptions 

For simplicity’s sake, we will assume that the producer’s
clock is synchronized with the archiver’s. Time synchronization
is an important research issue in sensor networks
[11] but goes beyond the scope of our paper. We will assume
that time is discrete and will denote the time domain
as $T = \{1, 2, \ldots\}$. The time quantum, corresponding to
one step, is the sampling period of the sensor. We will also
deal with time series whose value domain is $\mathbb{R}$, i.e., the real
numbers.

We define a time series as a sequence $S = \langle s[1], s[2], \ldots \rangle$
where $s[k] \in \mathbb{R}$ is the value of an observed
real-world process at time position $k \in T$. We
denote the time position of now as $n$. The observed series
at time $n$ is denoted $S^n = \langle s[1], s[2], \ldots, s[n] \rangle$. We use
$s[i : j]$ to denote a subseries from time $i$ to time $j$, i.e.,
$s[i : j] = \langle s[i], s[i + 1], \ldots, s[j - 1], s[j] \rangle$. Hence,
$S^n = s[1 : n]$.

The sensor has a finite energy supply. This is depleted
during normal sensing operation at some rate. Additional
energy drain is caused when using any of the sensor’s equipment,
including (i) powering its memory, (ii) using its CPU,
(iii) communicating bits, or listening on the wireless channel.
The specific rates of energy consumption for these
operations are sensor-specific. Modern sensors try to be
power-aware, shutting down components when they are not
used or, e.g., changing the frequency/voltage of the processor
[15] depending on the workload.

Different  sensors  are  bound  to  differ  in  the  details  of  their 
architecture  and  power  consumption.  We  simply  observe 
that  communication  is  often  the  major  cause  of  energy  drain 
[24, 16]  in  sensors  and  hence,  in  the  interest  of  extending  the 
sensor’s  life,  communication  must  be  curtailed. 

2.2.  Objectives 

We identified communication as the main target for optimization.
Our goals are the following:

•  To capture an approximate version $\hat{S}$ of the series $S$
in the archive. $\hat{S}$ is conceptually a time series for the
same time positions as $S$ but has a smaller actual representation.
We present algorithms to construct such a
compressed representation in Section 3.

•  To predict values of $S$ ahead of time, i.e., before receiving
them at the archive. This can be achieved by
fitting predictive models and using these to estimate values.
We examine the problem of prediction in our setting
in Section 4.

2.3.  Quality  Metric 

A commonly used metric for comparing time series is
that of Euclidean distance [17, 12]. If $S^n = s[1 : n]$ and
$\hat{S}^n = \hat{s}[1 : n]$ then this is defined as:

$$d(S^n, \hat{S}^n) = \frac{1}{n} \sum_{k=1}^{n} \left( s[k] - \hat{s}[k] \right)^2$$

If we were to specify quality as an upper bound on this
distance, then we would make room for large divergence
of individual samples of the time series. For similarity-retrieval
types of applications [26, 17], the main goal is to
identify similarity of overall structure, rather than similarity
of individual values. We do not want to assume any particular
use of the time series data. Hence, we will use a stronger notion of
quality, namely that the estimate $\hat{s}[k]$ for any individual sample
should not deviate from $s[k]$ by more than an upper
bound. We can use the $L_\infty$ metric:

$$L_\infty(S^n, \hat{S}^n) = \max_{1 \le k \le n} |s[k] - \hat{s}[k]|$$

and state the quality requirement as follows:

$$L_\infty(S^n, \hat{S}^n) \le \epsilon_{capt} \Leftrightarrow \max_{1 \le k \le n} |s[k] - \hat{s}[k]| \le \epsilon_{capt}$$

$\hat{S}^n$ is said to be a within-$\epsilon_{capt}$ approximation of $S^n$ if the
above holds. Note that this is a “stronger” notion of quality
in that it implies a bound on the Euclidean distance¹:

$$L_\infty(S^n, \hat{S}^n) \le \epsilon_{capt} \Rightarrow d(S^n, \hat{S}^n) \le \epsilon_{capt}^2$$

Our first objective can be formalized as capturing and
storing a within-$\epsilon_{capt}$ version of $S$ at the archive, where
$\epsilon_{capt}$ is a user-specified capture, or compression, tolerance.
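The two quality metrics of this section can be compared on a small example. The sketch below is illustrative only: the function names are hypothetical, and it assumes that $d$ is the mean of the squared differences, which is the reading consistent with the $\sqrt{n}$ bound mentioned in footnote 1.

```python
def linf_distance(s, s_hat):
    """Largest absolute deviation between two equal-length, non-empty series."""
    if len(s) != len(s_hat):
        raise ValueError("series must have the same length")
    return max(abs(a - b) for a, b in zip(s, s_hat))


def mean_squared_distance(s, s_hat):
    """Assumed form of d: the mean of the squared per-sample differences."""
    if len(s) != len(s_hat):
        raise ValueError("series must have the same length")
    return sum((a - b) ** 2 for a, b in zip(s, s_hat)) / len(s)


def is_within(s, s_hat, eps_capt):
    """True if s_hat is a within-eps_capt approximation of s."""
    return linf_distance(s, s_hat) <= eps_capt


if __name__ == "__main__":
    s = [0, 0, 0, 4]
    s_hat = [2, 2, 2, 2]
    print(linf_distance(s, s_hat), mean_squared_distance(s, s_hat))
```

A bound on the maximum deviation caps every squared term, so it also caps their mean; the reverse direction only yields the loose bound of the footnote.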

2.4.  Latency  and  the  need  for  prediction 

The use of prediction is motivated by the latency between
the producer and the archiver of the time series. This
can be broken down to:

•  Communication latency, $n_{comm}$: This includes the
transmission, propagation and queuing times in both
the wireless and wired links between producer and
archiver.

•  Compression latency, $n_{comp}$: This consists of the
time spent at the sensor processing $S^n$ so as to produce
$\hat{S}^n$.

¹Actually, $d(S^n, \hat{S}^n) \le \epsilon$ also implies a bound on $L_\infty(S^n, \hat{S}^n)$,
namely that $L_\infty(S^n, \hat{S}^n) \le \sqrt{n\epsilon}$, but this is a very loose bound as it
is proportional to $\sqrt{n}$.

As a result of the overall latency, at time $n$, the archiver
will have received not $\hat{S}^n$, but rather $\hat{S}^{n - n_{lag}}$, where
$n_{lag} = n_{comm} + n_{comp}$ is the number of time positions it is
“behind” the producer. Any real-time applications (e.g.,
queries) that require the value of the time series for any time
position $n_q$, in the future ($n_q > n$), the present ($n_q = n$),
or the recent past ($n - n_{lag} < n_q < n$) must wait for that
value to arrive from the producer.

Suppose that the value of the time series at time position
$n_q > n - n_{lag}$ is needed. The system will provide an
estimate $s_q[n_q]$ using any of the following three evaluation
strategies:

•  Predict: Some predictive model $M$ and its parameters
$\underline{\theta}$ are stored in the archive. Subsequently, $s_q[n_q]$ is
predicted as $\tilde{s}_{M,\underline{\theta}}[n_q]$. We will denote this as $\tilde{s}[n_q]$ when
$(M, \underline{\theta})$ is known. This raises two issues: how does one
obtain such a predictive model, and how can one
bound the difference between the predicted and
the actual value?

•  Probe: The producer is asked directly for $s[n_q]$. This
requires the producer to retain all samples (or
at least their approximations) it has not forwarded to
the archive. Additionally, the producer must now listen
on the wireless channel for probes. This is a cause of
energy drain [29]. The producer can tune in to listen
for probes occasionally. This will, however, increase
the latency seen by queries.

•  Wait: The final strategy is to simply wait for $\hat{s}[n_q]$ to
arrive from the producer. The quality of $s_q[n_q] = \hat{s}[n_q]$
is guaranteed; it is within $\epsilon_{capt}$ of $s[n_q]$.

We note here that prediction is very attractive, since it
does away with the latency involved with either doing a
probe or waiting for a value to be sent by the sensor. Our
second objective can be formalized as providing, to any interested
applications, a within-$\epsilon_{pred}$ estimate of time series
values before these values arrive at the archive. In Section
4 we show how this can be achieved. We will also
briefly discuss the important problem of choosing a predictive
model among many, and present the criterion by which
different candidate models can be compared.

2.5.  Combining  Compression  with  Prediction 

Some of the work done for prediction can be used for the
capture task as well. If the predictive model is somewhat
accurate, then the archive already has an idea of some time
series values before receiving them. The sensor can use this
to limit the effort that must be spent to compress the time
series. In Section 4.4 we will show how this basic intuition
can be used algorithmically.

3.  Compression  Algorithms 

Work in approximating time series has been extensive in
the literature. Time series have been approximated using
wavelets [4], Fourier transforms [1], piecewise linear approximations
[18], or polynomials [26]. Since the approximation
must be carried out by the sensor, a device of limited
abilities, the employed algorithm must be lightweight
in terms of processing and memory utilization.

3.1.  The  Piecewise  Constant  Approximation 

An attractive type of lossy² compression is the piecewise
constant approximation³ (PCA) [17], whereby the time series
$S^n$ is represented as a sequence of $K$ segments:

$$PCA(S^n) = \langle (c_1, e_1), (c_2, e_2), \ldots, (c_K, e_K) \rangle$$

where $e_k$ is the end-point of a segment and $c_k$ is a constant
value for times in $[e_{k-1} + 1, e_k]$, or for times in $[1, e_1]$ for
the first segment. In such an approximate representation,
we estimate $s[k]$ as:

$$\hat{s}[k] = \begin{cases} c_1 & \text{if } k \le e_1 \\ c_m & \text{if } e_{m-1} + 1 \le k \le e_m \end{cases}$$
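Looking up an estimate in a PCA representation amounts to finding the first segment whose end-point is not smaller than the requested time position. A minimal sketch follows, assuming segments are stored as a list of (constant, end-point) pairs with 1-indexed end-points; the function names are hypothetical and not part of the original text.

```python
from bisect import bisect_left


def pca_estimate(segments, k):
    """Return the estimate for time position k (1-indexed).

    segments is a list of (c, e) pairs sorted by increasing end-point e.
    """
    if not segments or k < 1 or k > segments[-1][1]:
        raise IndexError("time position outside the represented range")
    endpoints = [e for _, e in segments]
    m = bisect_left(endpoints, k)  # first segment with end-point >= k
    return segments[m][0]


def pca_expand(segments):
    """Rebuild the full approximate series from its segments."""
    series = []
    start = 1
    for c, e in segments:
        series.extend([c] * (e - start + 1))
        start = e + 1
    return series


if __name__ == "__main__":
    segs = [(2.0, 3), (5.0, 4), (1.0, 7)]
    print(pca_estimate(segs, 4), pca_expand(segs))
```

Because end-points are increasing, the lookup is a binary search; this is one reason the representation is easy to index.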

This representation is intuitive and easy to index. In
terms of compression, we note that a single segment costs
us $b_s + b_{tp}$ to store, where $b_s$ is the size of a sample value
and $b_{tp}$ is the size of a time position index. If a time series
of length $n$ is approximated with a PCA sequence of length
$K$, then the compression ratio is $\frac{K(b_s + b_{tp})}{n b_s}$. If each segment
corresponds to many samples of the time series ($\frac{K}{n}$ is significantly
less than 1), then high compression ratios can be
achieved.
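As a small worked example of the storage cost, the sketch below computes the ratio of compressed to raw size. It assumes the raw series costs $n b_s$ (time positions are implicit) and the compressed one costs $K(b_s + b_{tp})$; the function name and the byte sizes in the example are illustrative assumptions.

```python
def compression_ratio(num_segments, num_samples, b_s, b_tp):
    """Compressed size divided by raw size (smaller is better).

    Assumes each raw sample costs b_s and each segment costs b_s + b_tp.
    """
    if num_samples <= 0 or b_s <= 0:
        raise ValueError("num_samples and b_s must be positive")
    return num_segments * (b_s + b_tp) / (num_samples * b_s)


if __name__ == "__main__":
    # 1000 samples stored as 50 segments, 4 bytes per value and per index.
    print(compression_ratio(50, 1000, 4, 4))
```

With equal value and index sizes, each segment costs as much as two raw samples, so compression pays off only when segments cover more than two samples on average.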

A series $S^n$ can be approximated by different $PCA(S^n)$
approximations. As we will see, very simple on-line algorithms
with $O(1)$ space requirement can be used to construct
a PCA representation that preserves the desired quality
guarantee with minimum $K$.

3.2.  Poor  Man’s  Compression 

Poor Man’s Compression (PMC) is a bare-bones form of
compression that can be used to reduce the size of a time-series
representation. It is an on-line algorithm, producing

²Experimentation with lossless methods (gzip, not reported) indicates
very small (~50%) compression ratios. Lossless compression also does
not exploit the precision-compression tradeoff.

³This was called Adaptive Piecewise Constant Approximation (APCA)
in [17] to distinguish it from a similar approximation (PAA) with equal
segment lengths.


procedure PMC-MR

Input:
    time series $S = \langle s[1], s[2], \ldots \rangle$, tolerance $\epsilon_{capt} > 0$.
Output:
    compressed time series $PCA(S)$ within-$\epsilon_{capt}$ of $S$.

(1)   $PCA(S) \leftarrow \langle \rangle$;
(2)   $n \leftarrow 1$;
(3)   $m \leftarrow s[n]$;
(4)   $M \leftarrow s[n]$;
(5)   while S.hasMoreSamples()
(6)       if $\max\{M, s[n]\} - \min\{m, s[n]\} > 2\epsilon_{capt}$
(7)           append $(\frac{M + m}{2}, n - 1)$ to $PCA(S)$;
(8)           $m \leftarrow s[n]$;
(9)           $M \leftarrow s[n]$;
(10)      else
(11)          $m \leftarrow \min\{m, s[n]\}$;
(12)          $M \leftarrow \max\{M, s[n]\}$;
(13)      end;
(14)      $n \leftarrow n + 1$;
(15)  end;
(16)  append $(\frac{M + m}{2}, n - 1)$ to $PCA(S)$;

Figure 1. PMC-MR Algorithm

segments of the PCA representation as new samples arrive.
It requires only $O(1)$ space and performs $O(1)$ computation
per sample. Hence, its overall time complexity for a series
of size $n$ is $O(n)$. This computation is interspersed with the
arrival of samples; the compressed series is “ready to go” as
soon as the last sample is processed. Hence the compression
latency $n_{comp}$ of Section 2.4 is minimized.
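The procedure of Figure 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `pmc_mr` is hypothetical, the input is assumed to be any iterable of numbers, and segments are returned as (constant, end-point) pairs with 1-indexed end-points.

```python
def pmc_mr(samples, eps_capt):
    """Compress samples into (midrange, end_position) segments.

    Only the running minimum and maximum of the current segment are kept,
    so the space used besides the output is constant.
    """
    segments = []
    lo = hi = None  # running min / max of the current segment
    n = 0
    for n, value in enumerate(samples, start=1):
        if lo is None:  # first sample opens the first segment
            lo = hi = value
            continue
        new_lo, new_hi = min(lo, value), max(hi, value)
        if new_hi - new_lo > 2 * eps_capt:
            # Range would exceed 2 * eps_capt: close the segment at n - 1
            # and start a new one at the current sample.
            segments.append(((lo + hi) / 2, n - 1))
            lo = hi = value
        else:
            lo, hi = new_lo, new_hi
    if lo is not None:  # flush the final segment
        segments.append(((lo + hi) / 2, n))
    return segments


if __name__ == "__main__":
    print(pmc_mr([1, 2, 3, 10, 11, 9, 2, 2], 1))
```

Each sample triggers a constant amount of work, which matches the per-sample cost stated above.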

Let $s[i : j]$ be some time series. Can this be compressed
in a single segment in a manner that preserves the $\epsilon_{capt}$
guarantee? Lemma 1 supplies the necessary and sufficient
condition.

Lemma 1 The time series $s[i : j]$ can be compressed in a
single segment $(c, j)$ with an error tolerance $\epsilon_{capt}$ iff:

$$range[i : j] = \max_{i \le k \le j} s[k] - \min_{i \le k \le j} s[k] \le 2\epsilon_{capt}$$

Proof: If for all $k$, $i \le k \le j$, $|c - s[k]| \le \epsilon_{capt}$, then
also $|c - \max_{i \le k \le j} s[k]| \le \epsilon_{capt}$ and $|c - \min_{i \le k \le j} s[k]| \le
\epsilon_{capt}$. Hence, $|\max_{i \le k \le j} s[k] - \min_{i \le k \le j} s[k]| \le 2\epsilon_{capt}$,
or $range[i : j] \le 2\epsilon_{capt}$. We used the proposition $|a| \le
b \wedge |c| \le d \Rightarrow |a - c| \le b + d$. Conversely, if $range[i : j] \le
2\epsilon_{capt}$ then the segment $\left(\frac{\max_{i \le k \le j} s[k] + \min_{i \le k \le j} s[k]}{2}, j\right)$ can
be trivially shown to compress it within-$\epsilon_{capt}$.
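The condition of Lemma 1 is easy to check directly. The following sketch uses hypothetical helper names and assumes a non-empty list of numbers; it tests the range condition and computes the midrange constant used in the converse direction of the proof.

```python
def fits_single_segment(values, eps_capt):
    """Lemma 1 condition: the value range is at most 2 * eps_capt."""
    return max(values) - min(values) <= 2 * eps_capt


def midrange(values):
    """Constant whose worst-case deviation from the values is range / 2."""
    return (max(values) + min(values)) / 2


if __name__ == "__main__":
    window = [0, 0, 0, 4]
    c = midrange(window)
    print(fits_single_segment(window, 2), c, max(abs(v - c) for v in window))
```

The midrange sits halfway between the extremes, so its largest deviation is exactly half the range; no other constant can do better for the farther extreme.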

3.2.1. Poor Man’s Compression - Midrange. Our first algorithm
(see Figure 1), PMC-MR (for midrange), uses the
converse of Lemma 1. It monitors the range of its input.
While this is less than or equal to $2\epsilon_{capt}$ it updates the range
if needed (lines 11-12). When the range exceeds $2\epsilon_{capt}$ at
time $n$, then the segment ending at $n - 1$ with a constant
equal to the midrange of the preceding points is output
(line 7). The algorithm then tries to compress the next set
of samples, starting at time $n$ (lines 8-9).

PMC-MR not only achieves the goal of compression, but
satisfies an even stronger property: for any input time
series, no PCA representation that satisfies the $\epsilon_{capt}$
constraint can have fewer segments than the one PMC-MR
produces. PMC-MR is thus instance optimal not
only for the class of on-line algorithms, but over any algorithm
that solves this problem correctly. We state our claim
and its proof formally.

Theorem 1 Let $S^n = s[1 : n]$ be an arbitrary time series
that must be approximated with a piecewise constant
approximation that satisfies for all $k = 1, 2, \ldots, n$ that
$|s[k] - \hat{s}[k]| \le \epsilon_{capt}$. If the PMC-MR algorithm (Figure
1) produces a $PCA(S^n)$ representation with $K$ segments,
then no valid PCA representation with $K' < K$ segments
exists.

Proof: By contradiction. Let $BETTER(S^n)$ be a valid
representation of $S^n$ with $K' < K$ segments. Hence,
$K = K' + m$ where $m \ge 1$. Therefore, of the $K$ intervals
of $PCA(S^n)$ at least one does not contain an endpoint
of $BETTER(S^n)$. This cannot be the final one, since
the last endpoint of $BETTER(S^n)$ must be $n$, which is contained
in the final interval of $PCA(S^n)$. Let $[e_{i-1} + 1, e_i]$ be the interval of $PCA(S^n)$
that does not contain an endpoint of $BETTER(S^n)$. Let
$[e_{i-1} + 1 - v, e_i + w]$ with $v \ge 0$, $w \ge 1$ be the interval
of $BETTER(S^n)$ that covers $[e_{i-1} + 1, e_i]$. Thus,
$[e_{i-1} + 1 - v, e_i + w]$ covers $[e_{i-1} + 1, e_i + 1]$ as well (since
$w \ge 1$). But since $[e_{i-1} + 1, e_i]$ is an interval of $PCA(S^n)$,
produced by the PMC-MR algorithm, and it is not the final
one, then the range of values in $[e_{i-1} + 1, e_i + 1]$ is
greater than $2\epsilon_{capt}$. The range of values in any time interval
is always at least as large as the range of values in any of its
subintervals. Hence, the range of values in $[e_{i-1} + 1 - v, e_i + w]$
is greater than $2\epsilon_{capt}$. Therefore there does not exist a value $c$
such that for all values $s[k]$ where $k \in [e_{i-1} + 1 - v, e_i + w]$
it is $|c - s[k]| \le \epsilon_{capt}$ (Lemma 1). Therefore, the segment of
$BETTER(S^n)$ whose endpoint is at time position $e_i + w$
violates the $\epsilon_{capt}$ constraint and $BETTER(S^n)$ is not a
valid representation for $S^n$. □
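Theorem 1 can be checked empirically on small inputs by comparing the number of segments produced by the greedy range rule against an exhaustive dynamic program. The sketch below is only a sanity check under stated assumptions: both function names are hypothetical, and the dynamic program uses Lemma 1 as its feasibility test for a candidate segment.

```python
import random


def greedy_segment_count(samples, eps_capt):
    """Number of segments produced by the range-based greedy rule."""
    count = 0
    lo = hi = None
    for value in samples:
        if lo is None or max(hi, value) - min(lo, value) > 2 * eps_capt:
            count += 1  # open a new segment at this sample
            lo = hi = value
        else:
            lo, hi = min(lo, value), max(hi, value)
    return count


def min_segment_count(samples, eps_capt):
    """Fewest valid segments, found by an O(n^2) dynamic program."""
    n = len(samples)
    best = [0] + [n + 1] * n  # best[j]: fewest segments for the first j samples
    for j in range(1, n + 1):
        lo = hi = samples[j - 1]
        for i in range(j, 0, -1):  # candidate last segment s[i : j]
            lo, hi = min(lo, samples[i - 1]), max(hi, samples[i - 1])
            if hi - lo > 2 * eps_capt:
                break  # extending further left can only widen the range
            best[j] = min(best[j], best[i - 1] + 1)
    return best[n]


if __name__ == "__main__":
    rng = random.Random(0)
    series = [rng.randint(0, 20) for _ in range(25)]
    print(greedy_segment_count(series, 2), min_segment_count(series, 2))
```

Integer inputs are used in the example to keep the comparison free of floating-point effects.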

PMC-MR does an optimal job at compression, but it has
two disadvantages. First, the time series it generates cannot
be easily incorporated in similarity-retrieval systems,
which usually rely on PCA representations where the constant
value for each segment is the mean of the time series
values for that segment [17]. Second, the mean error produced
by PMC-MR may sometimes be large, even close to
$\epsilon_{capt}$, especially if the distribution of values is skewed. This
problem does not conflict with our specification of quality,
but it is an undesirable property of the algorithm.


Consider the time series $S = \langle 0, 0, 0, 4 \rangle$ and
suppose that $\epsilon_{capt} = 2$. PMC-MR will approximate
it with one segment $(2, 4)$. The mean error will be

$$\frac{|0 - 2| + |0 - 2| + |0 - 2| + |4 - 2|}{4} = 2$$

3.2.2. Poor Man’s Compression - Mean. To address
these problems, we propose a modified algorithm,
called Poor Man’s Compression-Mean (PMC-MEAN).
PMC-MEAN is identical to PMC-MR except that it uses
the mean of the points in each segment as the constant
of the segment. Values are sampled until the mean of
the points seen thus far is more than $\epsilon_{capt}$ away from the
minimum or maximum of the observed points. Then, a
segment is output and the algorithm tries to compress the
next set of points starting from the one that caused the
tolerance violation.

As an example, for the series $S$ above, PMC-MEAN
would output two segments $(0, 3)$ and $(4, 4)$. Its error
would be zero for these segments, but it will have produced
more segments than the optimal algorithm (PMC-MR).
Choosing between these two algorithms must depend
on the use of the data and their relative performance at
compression. In our experiments of Section 5 we will see
that over many datasets, PMC-MEAN performed only slightly
worse than PMC-MR. Hence, we consider it a viable
alternative to PMC-MR.
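PMC-MEAN, as described above, can be sketched along the same lines as PMC-MR. The code below is an illustrative reading of the prose description, not the authors' implementation: the function name is hypothetical, and the running mean is kept as a floating-point sum and a count, which may accumulate rounding error on long segments.

```python
def pmc_mean(samples, eps_capt):
    """Compress samples into (mean, end_position) segments."""
    segments = []
    lo = hi = None  # running min / max of the current segment
    total = 0.0     # running sum of the current segment
    count = 0       # number of samples in the current segment
    n = 0
    for n, value in enumerate(samples, start=1):
        if count > 0:
            new_lo, new_hi = min(lo, value), max(hi, value)
            new_mean = (total + value) / (count + 1)
            if new_hi - new_mean > eps_capt or new_mean - new_lo > eps_capt:
                # Adding this sample would push the mean too far from an
                # extreme value: close the segment before it.
                segments.append((total / count, n - 1))
                count = 0
        if count == 0:  # start a new segment at the current sample
            lo = hi = value
            total = float(value)
            count = 1
        else:
            lo, hi = new_lo, new_hi
            total += value
            count += 1
    if count > 0:  # flush the final segment
        segments.append((total / count, n))
    return segments


if __name__ == "__main__":
    print(pmc_mean([0, 0, 0, 4], 2))
```

On the example series the sketch closes the first segment when the fourth sample arrives, since the candidate mean of 1 would be 3 away from the maximum.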

3.2.3.  PCA  Segment  Transmission.  We  observe  that 
the  PMC  algorithms  produce  a  sequence  of  compressed 
segments.  These  can  be  forwarded  either  immediately,  or 
aggregated  into  packets  for  network  transmission  when 
either  the  sensor’s  memory  buffer  is  filled,  or  the  maximum 
packet  length  is  reached.  Normally,  we  would  like  to  fit 
as  many  segments  into  a  packet  as  possible,  since  each 
packet  has  some  overhead  in  terms  of  header  information. 
However,  since  packets  may  be  lost,  especially  over  the 
unreliable  links  available  to  sensors,  it  might  make  sense 
to  limit  the  maximum  packet  length  for  transmission,  thus 
avoiding  the  loss  of  large  segments  of  the  time  series  all  at 
once. 

Note that there is no guarantee for the time it takes for
a segment to be output. If compression is successful, then
potentially a single segment could go on forever, if no
new point causes a violation of the $\epsilon_{capt}$ condition.
In practice, we might interrupt these algorithms if we want
to receive segments of the time series in a timely manner.

4.  Prediction 

In Section 2, we motivated the use of prediction from
the need of real-time applications to co-exist with the capture
task. We will now address some issues arising when
prediction is performed.

4.1. Who should predict?

There  are  two  fundamental  ways  in  which  prediction  can 
be  used: 

•  Archive-side: The data archive contains at least
$\hat{s}[1 : n - n_{lag}]$. This can be used, via some prediction model,
to provide an estimate $\tilde{s}[k]$ for time positions
$k > n - n_{lag}$.

•  Producer-side: The producer sees the entire $s[1 : n]$.
Hence, it can also use some prediction model to provide
an estimate $\tilde{s}[k]$. In this case, the parameters of
this model need to be transmitted to the archive.

Archive-side prediction has the obvious advantage of not
requiring communication with the sensor⁴. A second advantage
is that the archive sees the “broad picture” of the
sensor’s history. It can thus infer predictive models at a
larger time scale, accounting perhaps for cyclic behavior,
global trends or other aspects not discernible from the sensor’s
limited (time-wise) perspective. Its disadvantages are:
(i) it is based on $\hat{S}$ and not on the precise $S$, (ii) it cannot
provide any prediction quality guarantee, as the archive
does not monitor the precise $S$, which can deviate from the
predicted $\tilde{S}$ without bound, and (iii) prediction must be accurate
$n_{lag}$ steps into the future for it to predict the present
value accurately. As we mentioned in Section 3.2.3, $n_{lag}$
may be very large.

Producer-side prediction has the disadvantage of requiring
communication. Since producers have limited memory,
only the most recent past of the time series can be stored in
it, or perhaps very coarse derivative information about the
more distant past. Hence, long-term effects like cycles cannot
be incorporated into the prediction model. However,
the main advantage of producer-side prediction is that it
uses the raw $S$ series, and allows for prediction guarantees.
Producer-side prediction will be used in the following.

4.2.  Producer-Side  Prediction 

The basic form of Producer-Side Prediction (PSP) is
shown in Figure 2. The input of the algorithm is a time
series, a prediction tolerance $\epsilon_{pred}$ and a parametric predictive
model $M$. PSP begins by guessing a set of parameters
for $M$ (line 1). Subsequently, each sample $s[k]$ is checked
against the predicted value based on the last set of parameters
$\underline{\theta}_{last}$ (line 7). If the difference is greater than $\epsilon_{pred}$, then a new set
of parameters is computed (line 8), by updating the old parameters
with the samples at time positions greater than the
time when the last prediction parameters were estimated.
The algorithm produces a sequence of $(\underline{\theta}, n)$ pairs. These

⁴In principle, the archive could also send prediction parameters to the
sensor, especially if prediction guarantees are required.

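procedure PSP

Input:
    time series $S = \langle s[1], s[2], \ldots \rangle$,
    prediction tolerance $\epsilon_{pred} > 0$, model $M$.
Output:
    prediction sequence $PS = \langle (\underline{\theta}_1, t_1), (\underline{\theta}_2, t_2), \ldots \rangle$.

(1)   guess $\underline{\theta}$ for $M$;
(2)   $PS \leftarrow \langle (\underline{\theta}, 0) \rangle$;
(3)   $\underline{\theta}_{last} \leftarrow \underline{\theta}$;
(4)   $n_{last} \leftarrow 0$;
(5)   $n \leftarrow 1$;
(6)   while S.hasMoreSamples()
(7)       if $|\tilde{s}_{M,\underline{\theta}_{last}}[n] - s[n]| > \epsilon_{pred}$
(8)           $\underline{\theta} \leftarrow$ updateParameters($M$, $s[n_{last} + 1 : n]$);
(9)           append $(\underline{\theta}, n)$ to $PS$;
(10)          $\underline{\theta}_{last} \leftarrow \underline{\theta}$;
(11)          $n_{last} \leftarrow n$;
(12)      end;
(13)      $n \leftarrow n + 1$;
(14)  end;

Figure 2. Producer-Side Prediction with Error Guarantee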


predict the time series starting from time $n$. This sequence
defines a within-$\epsilon_{pred}$ approximate version of $S$, which we
may denote as $\tilde{S}$.

We  observe  that  prediction  parameters  do  not  arrive 
instantaneously  to  the  archive.  Let  r  be  an  upper  bound  on 
this  time.  Clearly,  if  the  time  were  unbounded,  then  PSP 
would  provide  no  guarantee,  as  queries  can  never  be  certain 
whether  a  parameter  refresh  is  on  its  way  or  not. 
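The PSP loop of Figure 2 can be sketched in Python with the model supplied as two callables. Everything named here is a hypothetical illustration: `predict(theta, n)` returns the model's value at time $n$, `update(window, n)` refits the parameters from the samples seen since the last refresh, and the example uses a constant last-value model chosen only for simplicity. The sketch ignores the transmission delay discussed above.

```python
def psp(samples, eps_pred, predict, update, initial_theta):
    """Return the prediction sequence [(theta, n), ...] for a list of samples.

    A new (theta, n) pair is emitted whenever the current model misses
    the observed sample at time n by more than eps_pred.
    """
    prediction_sequence = [(initial_theta, 0)]
    theta_last = initial_theta
    n_last = 0
    for n, value in enumerate(samples, start=1):
        if abs(predict(theta_last, n) - value) > eps_pred:
            window = samples[n_last:n]  # s[n_last + 1 : n] in 1-indexed terms
            theta_last = update(window, n)
            prediction_sequence.append((theta_last, n))
            n_last = n
    return prediction_sequence


def predict_constant(theta, n):
    """Assumed toy model: predict the last transmitted value."""
    return theta


def update_constant(window, n):
    """Refit the toy model: keep the most recent sample."""
    return window[-1]


if __name__ == "__main__":
    data = [10, 10.5, 11.5, 13, 13.2]
    print(psp(data, 1, predict_constant, update_constant, 10))
```

With this toy model every refresh reproduces the current sample exactly; richer models would need their update step to bring the prediction at time $n$ back within tolerance for the sequence to describe a within-$\epsilon_{pred}$ series.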

4.2.1. Setting $\epsilon_{pred}$. The best value for $\epsilon_{pred}$ depends
on the quality requirements of real-time applications.
If values must be predicted frequently at a high quality,
then $\epsilon_{pred}$ must be set low. This problem was studied in
detail in [22]. In that paper, the (implicit) prediction model
was $\tilde{s}[k] = s[n_{last}]$. We can adopt a similar algorithm to
set $\epsilon_{pred}$ adaptively. The main intuition in [22] is that as
data becomes more variable, $\epsilon_{pred}$ is increased, to reduce
the number of messages. On the other hand, when queries
arrive at a high rate with small error tolerances, $\epsilon_{pred}$ is
decreased to make sure that these queries can be answered
at the server without performing any probes. Setting $\epsilon_{pred}$
adaptively does not conflict with the algorithms presented
in this paper.

4.2.2. Choosing a Prediction Model. Our use of
prediction does not assume a particular predictive model.
The actual model to be used must be chosen based on

(i) domain knowledge about the monitored attribute, and

(ii) engineering concerns, i.e., the cost associated with
fitting prediction models and (especially) transmitting
their parameters.

Traditionally, prediction performance is gauged by prediction
error. Suppose that $(M_1, \theta_1)$ and $(M_2, \theta_2)$ are
competing models with their parameters. If at time n it is the
case that:

$|s[n] - s^{M_1,\theta_1}[n]| > |s[n] - s^{M_2,\theta_2}[n]|$

then $(M_1, \theta_1)$ is a "worse" predictor than $(M_2, \theta_2)$ at
time n. From a system performance perspective, as long as
$(M_1, \theta_1)$ and $(M_2, \theta_2)$ do not produce errors greater than
$\epsilon_{pred}$ (resulting in the transmission of new parameters),
they are equivalent. Consider competing models $M_1$, $M_2$
and let $|\theta_1|$, $|\theta_2|$ denote the size (in bytes) of their
parameters. If $K_1$ messages are generated by $M_1$ and $K_2$ by $M_2$,
then $M_1$ is preferred if $K_1 |\theta_1| < K_2 |\theta_2|$, since this leads
to reduced data transmission.5 If the model must be fixed
a priori, then a decision must be made based on the above
criterion, using experimentation, expert opinion or past
experience to choose between competing models.
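The preference criterion above amounts to comparing total transmitted bytes. A minimal sketch follows; the function names and the optional `header_bytes` parameter are assumptions introduced for illustration (the latter reflects the per-message protocol overhead discussed in footnote 5), not part of the paper.

```python
def transmission_cost(num_messages: int, param_bytes: int, header_bytes: int = 0) -> int:
    """Total bytes sent: K messages, each carrying |theta| bytes plus optional overhead."""
    return num_messages * (param_bytes + header_bytes)


def preferred_model(k1: int, size1: int, k2: int, size2: int, header_bytes: int = 0) -> int:
    """Return 1 if model M1 is preferred, 2 if M2 is preferred, 0 on a tie."""
    cost1 = transmission_cost(k1, size1, header_bytes)
    cost2 = transmission_cost(k2, size2, header_bytes)
    if cost1 < cost2:
        return 1
    if cost2 < cost1:
        return 2
    return 0


if __name__ == "__main__":
    # M1: 100 refreshes of 8 bytes; M2: 60 refreshes of 16 bytes.
    print(preferred_model(100, 8, 60, 16))                    # compares 800 vs 960 bytes
    print(preferred_model(100, 8, 60, 16, header_bytes=40))   # compares 4800 vs 3360 bytes
```

When the header overhead dominates, the comparison effectively reduces to the number of messages, which is the simpler test mentioned in footnote 5.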


[Figure 3: time axis showing the queried time and the
parameter refresh times.]

Figure 3. Estimating Time Series Values

4.2.3. Adaptive Model Selection. In many situations,
a global model for predicting a time series is not a
valid assumption. It is likely that a time series can best be
approximated by different models for different epochs of
its history. We informally define an epoch as a time period
during which a time series' behavior is well-predicted by a
single model.

Consider for example a moving object in one dimension.
The object's position at different times is the time series of
interest. At times, the object may be idle. The best model
is then $s[n] = c$. At other times, it is moving at a constant
speed; a good model is then $s[n] = v \cdot (n - n_{last}) + s[n_{last}]$.
Sometimes it is accelerating, or decelerating, etc. All these
times are epochs of the object's history.

The problem of detecting changes in sequences is complex
[27, 13]. A general solution, applicable to different
types of time series and different classes of models, cannot
be easily found. The two main problems in epoch detection
are to (i) discover the epoch boundaries, and (ii) not be
fooled by temporary abnormal behavior into thinking that a
new epoch has commenced. These matters are part of our
current research. Briefly, we anticipate two modes of
adaptivity:

•  Producer-side model selection, in which the sensor
chooses from competing models in reaction to the
changing behavior of the time series.

•  Archiver-side model selection, in which the archive
monitors system performance and "uploads" prediction
code to the sensor, either by expert intervention
or automatically.

In Section 5 we will perform a simple experiment validating
the need for adaptive model selection.

5 Depending on the network protocols in place, the difference of $|\theta_1|$ vs.
$|\theta_2|$ may not be critical. Costs associated with the protocols (e.g., headers)
may dominate the cost of transmitting either $\theta_1$ or $\theta_2$ if, e.g., the difference
between $|\theta_1|$ and $|\theta_2|$ is only a few bytes. In such a case, the simple test
$K_1 < K_2$ would be preferred.


4.3.  Estimating  time  series  values  ahead  of  time 

An application q arrives at time n, asking for some estimate
$s_q[n_q]$ of $s[n_q]$ with bounded error: $|s[n_q] - s_q[n_q]| < \epsilon_q$.
The discussion that follows will refer to Figure 3.

•  Case I: $n_q < n - n_{lag}$. The estimate is based on the
captured series: $s_q[n_q] = \hat{s}[n_q]$. If $\epsilon_q < \epsilon_{capt}$, then
the quality of the captured series does not suffice to
provide an estimate with error bounded by $\epsilon_q$. In such a
case, the system can only provide an estimate $s_q[n_q] =
\hat{s}[n_q]$ which may violate the $\epsilon_q$ tolerance.

Let $(\theta_{last}, n_{last})$ be the most recent parameters
received at the archive.

•  Case II: $n_{last} > n_q$. We look at the prediction
sequence for the parameters $(\theta, n_r)$ where $n_r$ is the
most recent time position (with respect to $n_q$) in which
parameters were refreshed. The answer is $s_q[n_q] =
s^{M,\theta}[n_q]$ if $\epsilon_{pred} \le \epsilon_q$. If $\epsilon_q < \epsilon_{pred}$ but
$\epsilon_q \ge \epsilon_{capt}$, then the application can wait for $\hat{s}[n_q]$
to arrive. As we have mentioned in Section 3.2.3, this time
may be unbounded. We can force compressed segments to be
sent in a timely manner. Finally, if $\epsilon_q < \epsilon_{capt}$, then
it is necessary to do a probe which returns the exact6
$s[n_q]$.

•  Case III: $n_{last} < n_q$ and $n - n_q > \tau$. The answer is
$s_q[n_q] = s^{M,\theta_{last}}[n_q]$. If at time $n_q$ the time series
had violated the $\epsilon_{pred}$ tolerance, then we would have
received new parameters by "now" (n). As we have
not received new parameters (since $n_{last} < n_q < n$),
we are guaranteed that $|s[n_q] - s_q[n_q]| < \epsilon_{pred}$.
Again, if $\epsilon_q < \epsilon_{pred}$, a probe has to be issued for the
exact $s[n_q]$.

6 Samples need to be kept in the sensor's buffer if applications are
allowed to probe the sensor for exact values.

•  Case IV: $n_{last} < n_q$ and $n - n_q < \tau$. Potentially,
new parameters have been estimated for time $n_q$. The
last update is not guaranteed to be valid until time
position $n_q + \tau$. Thus, we wait until that time. While we
wait, it may be that $n_{last}$ changes, since parameters
estimated at times both before $n_q$ and after $n_q$ may
arrive. As long as they are before $n_q$, we still have to
wait. Otherwise (new $n_{last}$ of Figure 3), it becomes
that $n_{last} > n_q$, in which case we estimate the value
as described (Case II). Once again, we can choose to
wait for $\hat{s}[n_q]$ or probe for $s[n_q]$ depending on $\epsilon_q$.
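The four cases above can be summarized as a small decision procedure. The sketch below is illustrative only: the `Action` names and the function `choose_action` are hypothetical, and the handling of the boundary conditions that the text leaves open ($n_{last} = n_q$ is treated as Case II, $n - n_q = \tau$ as Case III) is an assumption.

```python
from enum import Enum


class Action(Enum):
    USE_CAPTURED = "answer from the captured (compressed) series"
    USE_PREDICTION = "answer from the applicable prediction parameters"
    WAIT_FOR_CAPTURED = "wait for the compressed segment covering n_q"
    WAIT_FOR_REFRESH = "wait until n_q + tau for possible parameter refreshes"
    PROBE = "probe the sensor for the exact value"


def choose_action(n, n_q, n_lag, n_last, tau, eps_q, eps_pred, eps_capt):
    """Decide how to answer a query for s[n_q] that arrives at time n."""
    # Case I: the queried position is already covered by the captured series.
    # If eps_q < eps_capt the answer may violate the requested tolerance.
    if n_q < n - n_lag:
        return Action.USE_CAPTURED

    # Case II: parameters refreshed at or after n_q have been received
    # (equality is an assumption; the text states n_last > n_q).
    if n_last >= n_q:
        if eps_pred <= eps_q:
            return Action.USE_PREDICTION
        if eps_q >= eps_capt:
            return Action.WAIT_FOR_CAPTURED
        return Action.PROBE

    # Here n_last < n_q.
    # Case IV: a refresh estimated at n_q may still be in transit.
    if n - n_q < tau:
        return Action.WAIT_FOR_REFRESH

    # Case III: any refresh for n_q would have arrived by now.
    if eps_pred <= eps_q:
        return Action.USE_PREDICTION
    return Action.PROBE


if __name__ == "__main__":
    print(choose_action(n=100, n_q=98, n_lag=20, n_last=85, tau=5,
                        eps_q=3.0, eps_pred=2.0, eps_capt=0.5))
```

In Case IV the caller would re-evaluate the decision once time $n_q + \tau$ is reached or new parameters arrive.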

4.4.  Combining  Prediction  and  Compression 

In  our  discussion  so  far,  we  have  treated  compression 
and  prediction  separately.  However,  both  of  them  estimate 
values  of  the  same  time  series,  from  its  past  and  its  future 
respectively.  Can  we  further  reduce  the  communication  cost 
by  combining  the  two? 

Observe that prediction in itself can be viewed as a form
of compression. Each set of parameters is a within-$\epsilon_{pred}$
approximation of the time series for all times until the next
parameter refresh. If we wanted to capture the time series
within $\epsilon_{capt}$, we could just as well have set $\epsilon_{pred} = \epsilon_{capt}$.
This would have sufficed to capture the time series in the
archive.

How good is such a strategy? If the time series
is very predictable, it may work better than any form of
compression that doesn't assume a model for the time series.
Consider, e.g., the time series $s[n] = n$. By fitting the
model $\tilde{s}[n] = n$, we never need to re-send any parameters
at all. On the other hand, suppose that the model approximates
the time series behavior poorly. Then, parameter refreshes
would be sent frequently. Compression would work
much better in this case.

Clearly, if $\epsilon_{pred} < \epsilon_{capt}$, no compression is needed. If
$\epsilon_{pred} > \epsilon_{capt}$ then $\tilde{s}[n]$ may deviate from $s[n]$ by more than
$\epsilon_{capt}$. We can still make use of $\tilde{s}[n]$ by observing the
following result.

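Theorem 2 Let $\Delta^n = \delta[1 : n] = \langle s[1] - \tilde{s}[1], s[2] - \tilde{s}[2],
\ldots, s[n] - \tilde{s}[n] \rangle$. If $\hat{\Delta}^n = \hat{\delta}[1 : n]$ is a
within-$\epsilon_{capt}$ approximation of $\Delta^n$, then the series
$\hat{S}^n = \langle \tilde{s}[1] + \hat{\delta}[1], \tilde{s}[2] + \hat{\delta}[2], \ldots,
\tilde{s}[n] + \hat{\delta}[n] \rangle$ is a within-$\epsilon_{capt}$ approximation of $S^n$.

Proof: By contradiction. Let $\hat{S}^n$ not be a within-$\epsilon_{capt}$
approximation of $S^n$. Then, there exists some k such that

$|s[k] - \hat{s}[k]| > \epsilon_{capt} \Leftrightarrow |s[k] - \tilde{s}[k] - \hat{\delta}[k]| > \epsilon_{capt}
\Leftrightarrow |\delta[k] - \hat{\delta}[k]| > \epsilon_{capt}$.

This contradicts our hypothesis, because $\hat{\Delta}^n$ is a
within-$\epsilon_{capt}$ approximation of $\Delta^n$. $\blacksquare$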

Theorem 2 presents an alternative strategy for compressing
$S$, if prediction is performed in the system. The sensor
can monitor the time series $\Delta$ of the prediction errors and
compress it within $\epsilon_{capt}$. Subsequently, the compressed $\hat{\Delta}$
can be sent to the archive which can then obtain a
within-$\epsilon_{capt}$ version of $S$ by adding the predicted series
$\tilde{S}$ to the compressed error $\hat{\Delta}$.
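The strategy of Theorem 2 can be checked numerically with a short sketch. The compressor below is a simple greedy midrange piecewise-constant scheme written for this illustration (in the spirit of, but not a reproduction of, the paper's PMC-MR pseudocode); all function names are hypothetical, and a non-empty input series is assumed.

```python
from typing import List, Sequence, Tuple


def compress_midrange(values: Sequence[float], eps: float) -> List[Tuple[int, float]]:
    """Greedy piecewise-constant compression with maximum error eps.

    Returns segments as (last_index, value); a segment is closed as soon as
    the range of its samples would exceed 2 * eps.
    """
    segments = []
    lo = hi = values[0]
    for i, v in enumerate(values[1:], start=1):
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo > 2 * eps:
            segments.append((i - 1, (lo + hi) / 2))   # midrange of the closed segment
            lo = hi = v
        else:
            lo, hi = new_lo, new_hi
    segments.append((len(values) - 1, (lo + hi) / 2))
    return segments


def decompress(segments: Sequence[Tuple[int, float]]) -> List[float]:
    """Expand (last_index, value) segments back into one value per time position."""
    out: List[float] = []
    start = 0
    for end, value in segments:
        out.extend([value] * (end - start + 1))
        start = end + 1
    return out


def reconstruct_from_error(series, predicted, eps_capt):
    """Compress the prediction error and add it back to the predicted series."""
    delta = [s - p for s, p in zip(series, predicted)]
    delta_hat = decompress(compress_midrange(delta, eps_capt))
    return [p + d for p, d in zip(predicted, delta_hat)]


if __name__ == "__main__":
    s = [0.0, 1.0, 2.5, 2.0, 6.0, 7.5]
    s_pred = [0.0, 0.0, 0.0, 2.5, 2.5, 2.5]   # a deliberately crude prediction
    s_hat = reconstruct_from_error(s, s_pred, eps_capt=1.0)
    print(max(abs(a - b) for a, b in zip(s, s_hat)))
```

Whatever the quality of the predicted series, the reconstructed series stays within the capture tolerance of the original; only the number of segments needed for the error series changes.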

When is compressing $\Delta$ preferable to compressing $S$?
This depends on $\epsilon_{pred}$ and the quality of the predictive
model. When $\epsilon_{pred}$ is close to $\epsilon_{capt}$, the error series $\Delta$,
taking values in the interval $[-\epsilon_{pred}, \epsilon_{pred}]$, will probably
violate the $\epsilon_{capt}$ tolerance infrequently. Hence, compressing
$\Delta$ may be better than compressing $S$. Conversely, as $\epsilon_{pred}$
increases, the error is allowed to fluctuate more; so, perhaps
compressing $S$ is preferable. The quality of the predictive
model is also a major factor. If $\Delta$ has a small range
(irrespective of $\epsilon_{pred}$) then it may be more compressible than
$S$.

In Figure 4 we show (first four plots) a time series $S$, its
within-$\epsilon_{capt} = 2$ compression $\hat{S}$, its within-$\epsilon_{pred} = 2.5$
prediction $\tilde{S}_{2.5}$ and its within-$\epsilon_{pred} = 5$ prediction $\tilde{S}_5$.
Subsequently, we show (next four plots) the prediction error
$\Delta_{2.5}$, its compression $\hat{\Delta}_{2.5}$, the prediction error $\Delta_5$, its
compression $\hat{\Delta}_5$. The compression of $S$ has 14 segments,
while the compression of $\Delta_{2.5}$ has only 10 segments, since
the prediction tolerance $\epsilon_{pred} = 2.5$ is close to $\epsilon_{capt} = 2$.
For this time series, the compression of $\Delta_5$ has 16 segments
and is thus worse than the compression of $S$.

In Section 5 we will see situations where compressing
either $S$ or $\Delta$ is better. The sensor can compress the
series in both ways. When a message is to be transmitted,
it can choose to forward either the compressed series or the
compressed error, depending on which is smaller. This adds
a small overhead for a marker identifying the strategy used
and a 2-fold increase in processing and memory usage
at the sensor. This is reasonable, since it reduces
communication.
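The choice between the two compressed forms can be sketched as below. The marker constants and the function name are hypothetical; the sketch assumes that every segment has the same encoded size, so that "smaller" can be measured by segment count, and that ties are resolved in favor of the compressed series.

```python
from typing import List, Tuple

Segment = Tuple[int, float]   # (last time position, value), an assumed encoding

SERIES_MARKER = 0   # payload is the compressed series
ERROR_MARKER = 1    # payload is the compressed prediction error


def choose_message(series_segments: List[Segment],
                   error_segments: List[Segment]) -> Tuple[int, List[Segment]]:
    """Return (marker, payload) for whichever representation has fewer segments."""
    if len(error_segments) < len(series_segments):
        return ERROR_MARKER, error_segments
    return SERIES_MARKER, series_segments


if __name__ == "__main__":
    # Segment counts taken from the Figure 4 discussion: 14 for S, 10 for the
    # error at eps_pred = 2.5, and 16 for the error at eps_pred = 5.
    compressed_s = [(i, 0.0) for i in range(14)]
    compressed_err_25 = [(i, 0.0) for i in range(10)]
    compressed_err_5 = [(i, 0.0) for i in range(16)]
    print(choose_message(compressed_s, compressed_err_25)[0])
    print(choose_message(compressed_s, compressed_err_5)[0])
```

The archive reads the marker first and, for an error payload, adds the predicted series back before storing the result.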

5.  Performance  Study 

In this section, we evaluate the ideas presented in
this paper. The results confirm the good performance of our
algorithms under different situations.

5.1.  Compression  Experiments 

First,  we  examine  the  effectiveness  of  PMC-MR  and 
PMC-MEAN  for  synthetic  and  real-world  data.  We  use  syn¬ 
thetic  Random  Walk  data  generated  as: 

$x[1] = 0$ and $x[n] = x[n - 1] + s_n$ where $s_n \sim U(-1, 1)$
Figure 4. Combining Prediction and Compression


We also used time series of environmental variables from an
oceanographic buoy sampled at 10 min intervals [21]. We
used a Sea Surface Temperature, Salinity, and Shortwave
Radiation series. Statistics about all used series are given in
Table 1. We preprocessed the buoy series to remove missing
values. We compress these time series at various $\epsilon_{capt}$. We
chose $\epsilon_{capt}$ as follows. We first determined the range of
each time series and used 1/1000th of that as our baseline
$\epsilon_{base}$. We then obtained the $\epsilon_{capt}$ values by multiplying
$\epsilon_{base}$ with factors of $\sqrt{10}$. We thus covered compression
tolerances from 1/1000 to 1/10 of the time series value range.
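The synthetic data and the tolerance settings described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the default of five tolerance levels is chosen here so that the middle level equals 1% of the value range, the central value referred to in the results.

```python
import math
import random
from typing import List


def random_walk(n: int, rng: random.Random) -> List[float]:
    """x[1] = 0 and x[k] = x[k - 1] + s_k with s_k drawn uniformly from [-1, 1]."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(x[-1] + rng.uniform(-1.0, 1.0))
    return x


def tolerance_ladder(series: List[float], steps: int = 5) -> List[float]:
    """Capture tolerances: baseline = range / 1000, then successive factors of sqrt(10)."""
    value_range = max(series) - min(series)
    eps_base = value_range / 1000.0
    return [eps_base * math.sqrt(10) ** k for k in range(steps)]


if __name__ == "__main__":
    walk = random_walk(100_000, random.Random(0))
    for eps in tolerance_ladder(walk):
        print(round(eps, 4))
```

For the Random Walk series of Table 1, whose range is 201.9, the baseline tolerance computed this way is 0.2019.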

In Figure 5 we show the K/n ratio achieved by PMC-MR
and PMC-MEAN over these time series for varying $\epsilon_{capt}$.
As expected, this ratio drops as $\epsilon_{capt}$ increases. The
performance of PMC-MEAN is very slightly worse than the
optimal PMC-MR algorithm. For the central $\epsilon_{capt}$ value (1% of
range), the K/n ratio was on average 8.3% for PMC-MR and
9.4% for PMC-MEAN.

In Figure 6 we show the mean absolute error over all time
positions. This is roughly less than half the $\epsilon_{capt}$ maximum.
PMC-MEAN and PMC-MR were comparable over the unbiased
synthetic data, but with real data, PMC-MEAN had
a slight edge. This is due to better approximation by using
the mean, and to the greater number of segments output by
PMC-MEAN for the same $\epsilon_{capt}$.

Dataset                    n         μ        σ        range[1 : n]
Random Walk                100,000   42.50    59.80    [-53.55, 148.35]
Sea Surface Temperature    143,508   28.62    0.67     [25.82, 31.87]
Salinity                   54,531    34.75    0.26     [33.41, 35.28]
Shortwave Radiation        117,069   269.41   358.00   [0, 1351.3]

Table 1. Statistics for Time Series used in our
Compression Experiments

Figure 5. Compression Performance (K/n ratio)

Figure 6. Compression Performance (Mean
Absolute Error)


[Figure 7 panels: Aggregate Queries (AVG and MIN with PMC-MR
and PMC-MEAN); Selection Queries and 20-NN Queries (false
positives and false negatives with PMC-MR and PMC-MEAN).
Horizontal axis: $\epsilon_{capt}$ tolerance (% Range).]

Figure 7. Answering Queries over Compressed
Time Series


Next, we test how query performance is impacted by
using compressed as opposed to precise time series. We
generate 100 series, each with 1,000 time positions from
the random walk model, choosing $x[1] \sim U(0, 10)$ to
simulate the difference in values reported by different
sensors. We compressed these using $\epsilon_{base} = 0.2019$ and
its $\sqrt{10}$ multiples as before, and asked: (i) 100 queries,
for random time positions, of the form "What is the
minimum and average sensor reading?" (Aggregate Queries),
(ii) 1,000 queries, one per time position, of the form "Which
sensors' values are above c?" (Selection Queries), where
c is uniformly chosen from [0, 10] for each query, and (iii)
100 queries, asking for the 20 nearest neighbors, in terms of
Euclidean distance, of all 100 time series (20-NN Queries).

The results are shown in Figure 7. We measure, for (i),
the relative error, defined as the fraction of the absolute
error over the exact answer, and for (ii) and (iii) the average
number of false positives and false negatives, i.e., the number
of time series that should not have been retrieved and the
number of time series that should have been retrieved but were
not. For aggregate queries, relative error is large only for
the MIN aggregate, since the PCA representation consistently
overestimates the MIN.7 For the selection queries,
the number of both false positives and false negatives was
small compared to the average query selectivity of 50.83.
The results are equally good for the 20-NN queries. In fact,
PMC-MEAN had no false positives/negatives in this case.
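The error measures used for the selection queries, and the bound on the MIN aggregate mentioned in footnote 7, can be sketched as follows. The function names are hypothetical. The bounds rest only on the capture guarantee: every approximate value lies within $\epsilon_{capt}$ of the exact one, so the exact minimum lies within $\epsilon_{capt}$ of the approximate minimum.

```python
from typing import Sequence, Tuple


def selection_errors(exact: Sequence[float],
                     approx: Sequence[float],
                     c: float) -> Tuple[int, int]:
    """Count (false positives, false negatives) for "which sensors' values are above c?".

    exact[i] and approx[i] are the true and the compressed reading of sensor i
    at the queried time position.
    """
    truth = {i for i, v in enumerate(exact) if v > c}
    answer = {i for i, v in enumerate(approx) if v > c}
    false_positives = len(answer - truth)   # retrieved but should not have been
    false_negatives = len(truth - answer)   # should have been retrieved but were not
    return false_positives, false_negatives


def min_bounds(approx: Sequence[float], eps_capt: float) -> Tuple[float, float]:
    """Lower and upper bound on the exact MIN given within-eps_capt approximations."""
    m = min(approx)
    return m - eps_capt, m + eps_capt


if __name__ == "__main__":
    exact = [4.9, 5.2, 7.0, 3.0]
    approx = [5.1, 5.0, 7.2, 3.3]
    print(selection_errors(exact, approx, c=5.0))
    print(min_bounds(approx, eps_capt=0.5))
```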

5.2.  Prediction  Experiments 

In  our  first  experiment,  we  want  to  motivate  experimen¬ 
tally  the  need  for  appropriate  model  selection  as  hinted  in 

7 A lower bound on the MIN can be easily found, given that the PCA
representation has an $\epsilon_{capt}$ guarantee.


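[Figure 8 panels: Constant Velocity Time Series; Constant
Acceleration Time Series.]

Figure 8. Model Selection

[Figure 9 panels include: Shortwave Radiation.]

Figure 9. Prediction and Combined
Prediction/Compression Experiments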


Section 4.2.2. We consider the location of an object moving
in one dimension. This can be captured by a sensor,
either on the object (e.g., GPS) or independent of it (e.g.,
radar). The object may move at a constant speed v for some
length of time or accelerate/decelerate. We generated 100
time series of length 500 for each type of motion, choosing
$v \sim U(0, 50)$ and $a \sim U(0, 10)$. We added some measurement
error $\sim U(-25, 25)$ on the location and tried to predict
the location as: (i) last known location, (ii) first-order
model (constant speed), (iii) second-order model (constant
acceleration). We fit these models on the 10 most recent
samples at prediction time. In Figure 8 we show the relative
performance (number of parameter refreshes) using these
three models, for varying $\epsilon_{pred}$ ranging from 10 to 160 meters.
Not surprisingly, the best predictive model in each case
is the one which generates the behavior. As an example, for
the constant velocity series, the last-known-value model is
"too simple", failing to capture the change in the object's
location, while the constant acceleration model is "too complex"
and, despite encompassing the constant velocity model as a
special case, fails to outperform it. Our example illustrates the
benefit of pushing some intelligent behavior to the sensor
and the importance of choosing a model carefully.
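A reduced version of this experiment can be sketched in Python. The sketch covers only two of the three models (last known location and constant speed), uses hypothetical function names, and assumes that the initial guess is fitted on the first sample and that fewer than 10 samples are used when fewer are available; it is not the authors' experimental code.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[int], float]


def fit_last_value(times: Sequence[int], values: Sequence[float]) -> Predictor:
    """Zeroth-order model: predict the most recent observed location."""
    c = values[-1]
    return lambda t: c


def fit_constant_speed(times: Sequence[int], values: Sequence[float]) -> Predictor:
    """First-order model: least-squares line through the fitting window."""
    k = len(times)
    if k < 2:                       # a slope needs at least two points
        return fit_last_value(times, values)
    t_mean = sum(times) / k
    v_mean = sum(values) / k
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return lambda t: v_mean + slope * (t - t_mean)


def count_refreshes(series: Sequence[float], eps_pred: float,
                    fit: Callable[..., Predictor], window: int = 10) -> int:
    """Number of parameter refreshes needed to keep the error within eps_pred."""
    predict = fit([0], [series[0]])             # initial guess from the first sample
    refreshes = 0
    for n in range(1, len(series)):
        if abs(predict(n) - series[n]) > eps_pred:
            lo = max(0, n - window + 1)         # the `window` most recent samples
            predict = fit(list(range(lo, n + 1)), list(series[lo:n + 1]))
            refreshes += 1
    return refreshes


def noisy_constant_velocity(length: int, v: float, noise: float,
                            rng: random.Random) -> List[float]:
    """Location v * n plus uniform measurement error in [-noise, noise]."""
    return [v * n + rng.uniform(-noise, noise) for n in range(length)]


if __name__ == "__main__":
    rng = random.Random(42)
    series = noisy_constant_velocity(500, v=20.0, noise=25.0, rng=rng)
    for name, fit in [("last value", fit_last_value), ("constant speed", fit_constant_speed)]:
        print(name, count_refreshes(series, eps_pred=40.0, fit=fit))
```

On a noise-free constant-velocity series the constant-speed model needs a single refresh, whereas the last-value model must refresh whenever the object has moved by more than the tolerance.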
In our next experiment, we used a simple predictive
model, namely predicting future values of the series as
equal to its value at prediction time. This is optimal if the
series is undergoing an unbiased random walk, since the
expected value of the series at every future time is equal to its
value at prediction time. Using this, each parameter update
consists of a value and the prediction time. It thus has the
same size as a segment of the PCA representation.

We use the same time series as before. We set $\epsilon_{capt} =
10 \epsilon_{base}$, i.e., the "middle" value of our compression
experiments. We simulate for $\epsilon_{pred}$ in 1- to 5-fold multiples of
$\epsilon_{capt}$. To conserve space, we combine a number of curves
in the graphs of Figure 9. "Compression Only" is the number
of segments for PMC-MR compression of $S$ at $\epsilon_{capt}$
tolerance and "Prediction Only" is the number of parameter
refreshes when $\epsilon_{pred} = \epsilon_{capt}$, i.e., when prediction alone
is used to capture the time series. As we expect, compression
works much better because it compresses values already
seen optimally, rather than predicting future (uncertain)
values. "Prediction and Compression of S" is the sum
of the number of segments of a within-$\epsilon_{capt}$ compression of
$S$ and the number of parameter refreshes for $\epsilon_{pred}$ ranging
from one to five times $\epsilon_{capt}$. For "Prediction and Compression
of Delta" we make use of the result of Section 4.4 and
compress the error series rather than $S$. As mentioned, when
$\epsilon_{pred}$ is small, compressing $\Delta$ as opposed to $S$ is preferable.
As $\epsilon_{pred}$ increases, the two curves approach each other. In
the first two time series, compression of $S$ is slightly
preferable, while in the other two the situation is reversed.

6.  Related  Work 

Olston et al. [22] study the performance/accuracy
tradeoff with approximately replicated data. The motivation
is in reducing communication, quantified as the number
of messages exchanged between the producer and the receiver
of data. Interval-based approximations are stored at the
receiver end, supplied as guarantees by the producer. Our
work differs in employing a general model of approximate
replication which considers temporal latency and combines
compression and prediction. [22] proposes an algorithm for
adaptively setting the interval width; this can also be used
to adaptively set $\epsilon_{pred}$. The adaptation problem was also
studied by Deolasee et al. [10] for web data.

Sensor databases have recently been the center of much
research in the database community, e.g., in the Cougar [3]
and Telegraph [20] projects. These efforts aim to create
technology that will enable the creation of databases where
sensors can be accommodated, taking into account the novel
performance and semantic issues that distinguish sensors
from traditional data sources.

Time series data has long been an important area of
research. Our paper is not focused on introducing algorithms
for extracting information from time series or on similarity
retrieval as, e.g., in Keogh et al. [17] or Agrawal et al. [1].
Our focus is on capturing sensor-generated series;
applications similar to the above can then be applied to such series
in the archive.

Chen et al. [7] propose compression of databases,
motivated by the storage and bandwidth limitations of mobile
devices. Unlike our paper, devices are the destinations of
data. In Chen et al. [6] the problem of database
compression and querying over compressed databases is
studied. The authors motivate their work by the increase in
CPU power, making it attractive to spend CPU time in
compressing/decompressing data rather than in doing disk I/O
for them. Our motivation is similar, making use of sensors'
CPU power to limit communication and energy drain.

In our paper, we use prediction as a means of improving
system performance, namely saving communication and
energy drain. This is different from the common use of
prediction in which only the predicted values themselves are
of interest. Gao et al. [12] also proposed to use prediction
of time series values. In [12], the goal is to enable
similarity-based pattern queries in batch mode by finding
nearest neighbors of an incoming time series. By applying
prediction on this time series, the system can generate
candidate nearest neighbors ahead of time. When the actual
values of the incoming series arrive, these are filtered and
the actual nearest neighbors are returned.

Chen  et  al.  [5]  propose  to  perform  on-line  regression 
analysis  over  time  series  data  streams.  We  also  propose  to 
fit  models  to  time  series,  but  our  motivation  is  to  improve 
system  performance,  rather  than  regression  analysis.  A  use¬ 
ful  extension  to  our  work  would  be  to  use  some  of  the  ideas 
in  [5]  to  address  correlations  between  multiple  time  series 
that  a  single  sensor  may  be  monitoring. 

Finally, we refer to work in moving object databases
[28, 25, 19]. In this research field, we find the idea of
approximating the time series of an object's location without
continuous updates in Wolfson et al. [28], of predicting an
object's future location based on its velocity vector in
Saltenis et al. [25], and of using the predictability of motion for
improving performance in Lazaridis et al. [19].

7.  Conclusions 

In this paper we motivate the importance of capturing
time series generated by wireless sensors. To achieve this
we task sensors with compressing time series and fitting
predictive models. We propose an optimal online algorithm
for creating the piecewise-constant approximation of a
real-valued time series, satisfying a bound on the $L_\infty$ distance,
and show how prediction and compression can co-exist in a
system to address the needs of both the time series capture
task and real-time applications.
In the future, we plan (i) to evaluate the effectiveness of
our techniques in a real-world setting, especially for motion
time series, (ii) to examine how to evaluate general SQL
queries with answer quality or response deadline tolerances,
(iii) to develop adaptive algorithms for predictive model
selection, and (iv) to investigate lateral communication
between sensors, exploiting redundancy of information across
many sensors to further improve performance in the time
series capture setting.

Acknowledgements 

Our work was supported by the National Science Foundation
(Awards IIS-9996140, IIS-0086124, CCR-0220069,
IIS-0083489) and by the United States Air Force (Award
F33615-01-C-1902).

References 

[1] R. Agrawal, C. Faloutsos, and A. N. Swami. Efficient Similarity
Search In Sequence Databases. In Proceedings of the
4th International Conference of Foundations of Data Organization
and Algorithms (FODO), pages 69-84, Chicago,
Illinois, 1993. Springer Verlag.

[2] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom.
Models and issues in data stream systems. In Symposium on
Principles of Database Systems (PODS), 2002.

[3] P. Bonnet, J. Gehrke, and P. Seshadri. Towards sensor
database systems. In Mobile Data Management (MDM),
2001.

[4] K. Chan and A. W.-C. Fu. Efficient time series matching by
wavelets. In ICDE Conference, pages 126-133, 1999.

[5] Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang.
Multi-dimensional regression analysis of time-series data streams.
In VLDB Conference, 2002.

[6] Z. Chen, J. Gehrke, and F. Korn. Query optimization in
compressed database systems. In ACM SIGMOD Conference,
2001.

[7] Z. Chen and P. Seshadri. An algebraic compression framework
for query results. In International Conference on Data
Engineering (ICDE), 2000.

[8] J. Cho and H. Garcia-Molina. Synchronizing a database to
improve freshness. In ACM SIGMOD Conference, 2000.

[9] W. S. Conner, L. Krishnamurthy, and R. Want. Making everyday
life easier using dense sensor networks. In Ubicomp,
Lecture Notes in Computer Science. Springer, 2001.

[10] P. Deolasee, A. Katkar, A. Panchbudhe, K. Ramamritham,
and P. Shenoy. Adaptive push-pull: disseminating dynamic
web data. In The tenth international World Wide Web conference
on World Wide Web, pages 265-274. ACM Press,
2001.

[11] J. Elson and D. Estrin. Time synchronization for wireless
sensor networks. In 2001 International Parallel and Distributed
Processing Symposium (IPDPS), 2001.

[12] L. Gao and X. S. Wang. Continually evaluating similarity-based
pattern queries on a streaming time series. In ACM
SIGMOD Conference, 2002.

[13] W. Gilchrist. Statistical Forecasting. John Wiley & Sons,
London, 1976.

[14] B. Horling, R. Vincent, R. Mailler, J. Shen, R. Becker,
K. Rawlins, and V. Lesser. Distributed sensor network for
real time tracking. In Proceedings of the fifth international
conference on Autonomous agents, pages 417-424. ACM
Press, 2001.

[15] C. Hughes, J. Srinivasan, and S. Adve. Saving energy with
architectural and frequency adaptations for multimedia applications.
In Proceedings of the 34th Annual International
Symposium on Microarchitecture (MICRO-34), Dec. 2001.

[16] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed
diffusion: a scalable and robust communication paradigm
for sensor networks. In Proceedings of the sixth annual international
conference on Mobile computing and networking,
pages 56-67. ACM Press, 2000.

[17] E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani.
Locally adaptive dimensionality reduction for indexing large
time series databases. In ACM SIGMOD Conference, 2001.

[18] E. J. Keogh, S. Chu, D. Hart, and M. J. Pazzani. An online
algorithm for segmenting time series. In International
Conference on Data Mining. IEEE Computer Society, 2001.

[19] I. Lazaridis, K. Porkaew, and S. Mehrotra. Dynamic queries
over mobile objects. In EDBT Conference, 2002.

[20] S. Madden and M. J. Franklin. Fjording the stream: An
architecture for queries over streaming sensor data. In International
Conference on Data Engineering (ICDE), 2002.

[21] M. J. McPhaden. Tropical atmosphere ocean project,
pacific marine environmental laboratory.
http://www.pmel.noaa.gov/tao/.

[22] C. Olston, B. T. Loo, and J. Widom. Adaptive precision
setting for cached approximate values. In ACM SIGMOD
Conference, 2001.

[23] C. Olston and J. Widom. Best-effort cache synchronization
with source cooperation. In ACM SIGMOD Conference,
2002.

[24] G. J. Pottie and W. J. Kaiser. Wireless integrated network
sensors. Communications of the ACM, 43(5):51-58, 2000.

[25] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A.
Lopez. Indexing the positions of continuously moving objects.
In SIGMOD Conference, 2000.

[26] H. Shatkay and S. B. Zdonik. Approximate queries and
representations for large data sequences. In ICDE, pages
536-545, 1996.

[27] L. Telksnys, editor. Detection of changes in random processes.
New York, Optimization Software, 1986.

[28] O. Wolfson, S. Chamberlain, S. Dao, L. Jiang, and
G. Mendez. Cost and imprecision in modeling the position
of moving objects. In ICDE Conference, 1998.

[29] X. Yang and A. Bouguettaya. Broadcast-based data access
in wireless environments. In International Conference on
Extending Database Technology (EDBT), 2002.

[30] S. Zdonik, U. Cetintemel, M. Cherniack, C. Convey, S. Lee,
G. Seidman, M. Stonebraker, N. Tatbul, and D. Carney.
Monitoring streams - a new class of data management applications.
In VLDB Conference, 2002.




